The Role of Exchangeability in Inference

D. V. Lindley
University  College  London 


Melvin  R.  Novick 
The  University  of  Iowa 


1.  Introduction 

This paper presents what we believe to be a useful way of looking
at problems of statistical inference. The thesis is in three parts.
First, it is argued that inference is a process whereby one passes
from data on a set of units to statements about a further unit.
Standard procedures concentrate on the data and tend to ignore the
connection with the case to which the inference is to be applied. Second,
we show how this connection can be established using either de Finetti's
idea of exchangeability or Fisher's concept of a subpopulation. Third,
in making the connection it is important to use the appropriate
probability, since there are many instances where statisticians have used
what we argue is the wrong value.

The paper begins with some striking examples. There then follows
a section developing some technical ideas which are applied to resolve
the paradoxes raised by the examples. Topics discussed include analysis
of variance and covariance, contingency tables, and calibration. Some
comments on randomization are also included. Although we do not enter
into controversies over statistical methods of inference, the paper
does, we believe, give support to the personalistic view by demonstrating
its usefulness.


2.  Simpson's  Paradox 

Consider the data in Table 1 where 40 patients were given a
treatment, T, and 40 assigned to a control, T̄. The patients either
recovered, R, or did not, R̄. We are not considering small-sample problems
so that the reader can if he wishes imagine all the numbers multiplied by
10,000 say. It is then clear that the recovery rate for patients receiving
the treatment at 50% exceeds that for the control at 40% and the treatment
is apparently to be preferred. However, the sex of the patients was also


Insert  Table  1  about  here 


recorded and Table 2 gives the breakdown of the same 80 patients with
sex, M male or M̄ female, included. It will now be seen that the
recovery rate for the control patients is 10% higher than that for the
treated ones, both for the males and the females. Thus, what is good
for the men is good for the women, but bad for the population as a
whole. We refer to this as Simpson's (1951) paradox, though it occurs
in Cohen and Nagel (1934). In appendix 1 we describe the situation
mathematically and show that the paradox can only arise if, R and T
being positively associated, M is positively associated both with R
and with T. This is exactly what has happened here: the males have
been mostly assigned to the treated group, the females to the control;
perhaps because the doctor distrusted the treatment and so was
reluctant to give it to the females, where the recovery rate is much lower
than for males. Alternatively expressed, treatment and sex have been
confounded. Nevertheless it comes as a surprise to most people to
learn that confounding can actually reverse an effect; here from +10%
to -10%.
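The reversal can be checked numerically. The sketch below is illustrative only: Tables 1 and 2 are not reproduced in this text, so the cell counts are an assumption, inferred from the rates quoted here and in Section 5 (18/30 for treated males, a 10% advantage for the control within each sex, 50% against 40% overall). The helper names `rate` and `pooled` are hypothetical.

```python
# Counts are (recovered, not recovered). They are inferred from the rates
# quoted in the text and are an assumption, not a copy of Tables 1 and 2.
table2 = {
    ("male", "treatment"): (18, 12),
    ("male", "control"): (7, 3),
    ("female", "treatment"): (2, 8),
    ("female", "control"): (9, 21),
}

def rate(recovered, not_recovered):
    """Recovery rate for one cell of the table."""
    return recovered / (recovered + not_recovered)

def pooled(arm):
    """Collapse over sex to obtain the Table 1 counts for one arm."""
    rec = sum(r for (sex, a), (r, n) in table2.items() if a == arm)
    non = sum(n for (sex, a), (r, n) in table2.items() if a == arm)
    return rec, non

overall = {arm: rate(*pooled(arm)) for arm in ("treatment", "control")}
by_sex = {key: rate(*cell) for key, cell in table2.items()}

print(overall)  # pooled rates: treatment ahead of control
print(by_sex)   # within each sex: control ahead of treatment
```

Under these assumed counts the treatment leads by ten points in the pooled table and trails by ten points within each sex, which is the reversal described above.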


Insert  Table  2  about  here 


An important problem posed by the paradox is this: Given a person
of unknown sex would you expect the control or the treatment to be the
more effective? (If having an unknown sex seems odd replace M and M̄ by
a dichotomy that is difficult to determine, such as a genetic
classification.) The answer seems clear that, despite Table 1, the control
is better. If so, then this warns us to be very careful in using results
like those in Table 1 to draw the opposite conclusions, for could there not
exist a factor, here sex, which reversed the conclusion? But is the
answer so clear? Keeping the numbers the same, imagine data with T
and T̄ replaced by white and black varieties of a plant respectively,
and R and R̄ corresponding to high and low yields; the confounding
factor being whether the plant grew tall, M, or short, M̄. The white
variety is 10% better overall, but 10% worse among both tall and short
plants. In this case the white variety, T, seems the better one to
plant; whereas T̄, the control, was intuitively preferred in the medical
situation.

The  problem  addressed  in  this  paper  is  that  of  providing  a  formal 
framework  within  which  such  problems  can  systematically  be  resolved.  In 
the  next  section  we  describe  some  mathematical  ideas  that  are  used  in 
subsequent  sections  to  discuss  Simpson's  paradox  and  related  problems. 

3. Exchangeability and Recognizable Subpopulations

Throughout the paper we shall use probability in the sense of a
number which a person, conveniently called You, would attach to the
truth of an event, A, were he to be informed of the truth of another
event, B. It is termed the probability of A given B and written
p(A|B). Sometimes, reference to the conditioning event B, being
understood, is omitted and we refer to Your probability of A, p(A).
This is not a frequency concept but its relation to relative
frequencies will be considered later.

Let X and Y be two random variables, each of which may be
multidimensional. Recovery and yield are two examples from Section 2.
Consider next a number of similar things termed units; in the examples
of Section 2 they are patients and plants. For the i-th unit, let the
random variables X and Y assume values X_i, Y_i. While these are
unknown to You, they will be referred to as random quantities and You
will have probabilities for them. As soon as You observe them they
become numbers x_i, y_i and the randomness (and hence the probability
notion) disappears.

A number n of units is termed exchangeable in X if the joint
probability distribution p(X_1, X_2, ..., X_n) is invariant under
permutation of the units. A further unit is exchangeable in X with the
set if all (n + 1) units are so exchangeable. In the medical example
the n = 40 patients who received the treatment might be judged
exchangeable in recovery, and a further patient might be judged exchangeable
with the 40 were he to receive the treatment.
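For a small number of units with a two-valued X, the definition can be checked by brute force. The following is a minimal sketch, not part of the original argument: the function `is_exchangeable` and the two example distributions (`mixture` and `sticky`) are hypothetical illustrations chosen to show one distribution that satisfies the definition and one that does not.

```python
from itertools import permutations, product

def is_exchangeable(pmf, n, tol=1e-12):
    """pmf maps each 0/1 tuple of length n to its joint probability.
    Returns True if the probability is unchanged by permuting the units."""
    for x in product((0, 1), repeat=n):
        for perm in permutations(range(n)):
            y = tuple(x[i] for i in perm)
            if abs(pmf[x] - pmf[y]) > tol:
                return False
    return True

def mixture(x):
    """Units share an unknown propensity, 0.2 or 0.8 with equal probability.
    The units are dependent, yet the order in which they appear is irrelevant."""
    k, n = sum(x), len(x)
    return 0.5 * 0.2**k * 0.8**(n - k) + 0.5 * 0.8**k * 0.2**(n - k)

def sticky(x):
    """Each unit tends to repeat its predecessor, so order matters."""
    p = 0.5
    for prev, cur in zip(x, x[1:]):
        p *= 0.9 if cur == prev else 0.1
    return p

outcomes = list(product((0, 1), repeat=3))
mixture_pmf = {x: mixture(x) for x in outcomes}
sticky_pmf = {x: sticky(x) for x in outcomes}

print(is_exchangeable(mixture_pmf, 3))
print(is_exchangeable(sticky_pmf, 3))
```

The mixture example illustrates that exchangeability is weaker than independence: the units are informative about one another, which is exactly what allows data on n units to bear on unit (n + 1).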

A number of units is termed exchangeable in X, given Y = y, if the
joint conditional distribution p(X_1, X_2, ..., X_n | Y_i = y, all i) is
invariant under permutation of the units. If this holds for all y, we
refer to exchangeability in X, given Y. A further unit is exchangeable
in X given Y = y with the set if the enlarged set of all (n + 1) units is
so conditionally exchangeable. In the medical example, the 40 patients
who were male (y) might be judged exchangeable in recovery (conditional
on their sex). If the same holds for the 40 females then the 80 patients
are exchangeable in recovery given sex. This is not the only possible
definition of conditional exchangeability; another form is discussed in
appendix 2. The form given here is adequate for the applications
considered in the present paper.

Consider the case where X refers to an event such as recovery, and
so only takes two values, R and R̄. Suppose n units in the data and a
further unit are exchangeable in X. Then for inference purposes, we
may be interested in the possible value of X in unit (n + 1) given the
values of X in the n units. If n is large, the probability that the
event will occur in the new unit is simply the frequency with which
the event has occurred in the n units. A rigorous demonstration of
this requires de Finetti's theorem on the structure of exchangeable
sequences; however, the result is intuitively obvious. It is
important because it provides a link between the view of probability adopted
here and the frequency viewpoint. We shall use the term propensity
(or chance) to describe the frequency and write P(A) for the propensity
of an event A. The result just mentioned may be abbreviated to p(A) =
P(A), equating the probability and propensity. Notice that the
condition of exchangeability has been omitted from the notation. The
concept extends to conditional exchangeability in X given Y when we
will have p(A|Y = y) = P(A|Y = y), the propensity of A among those
units having Y = y. In the medical example, if a judgment of
exchangeability in recovery, given sex and treatment, is made then the
probability that another male will recover, given the treatment, is, from
Table 2, the propensity 18/30 = 0.6.
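The large-n statement can be illustrated with a small calculation. The sketch below makes an assumption that is not in the text: Your opinion about the unknown propensity is represented by a Beta(a, b) distribution, uniform by default, in which case the probability for unit (n + 1) has the closed form (k + a)/(n + a + b). The function name `predictive` is hypothetical.

```python
from fractions import Fraction

def predictive(successes, n, a=1, b=1):
    """Probability that unit n+1 shows the event, given `successes` among n
    exchangeable units, under an assumed Beta(a, b) prior on the propensity."""
    return Fraction(successes + a, n + a + b)

# 18 recoveries among 30 treated males, then the same frequency with the
# numbers multiplied up, as the text invites the reader to imagine.
for scale in (1, 100, 10_000):
    k, n = 18 * scale, 30 * scale
    print(n, float(predictive(k, n)), k / n)
```

With the numbers as given the probability differs slightly from the frequency 0.6; once they are multiplied by 10,000 the two agree to several decimal places, whatever reasonable prior is assumed.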

The ideas of probability and exchangeability just mentioned are due
to de Finetti (1974). Fisher uses the term, probability, somewhat
differently and in conjunction with the concept of a population. We pause to
consider these ideas and their interconnections. Fisher's ideas are most
clearly expressed in his last book (1956), in particular in this section
from p. 33:

"This fundamental requirement [of no recognizable subset] for
the applicability to individual cases of the concept of
classical probability shows clearly the role of subjective ignorance,
as well as that of objective knowledge in a typical probability
statement. It has often been recognized that any probability
statement, being a rigorous statement involving uncertainty,
has less factual content than an assertion of certain fact
would have, and at the same time has more factual content than
a statement of complete ignorance. The knowledge required for
such a statement refers to a well-defined aggregate, or
population of possibilities within which the limiting frequency
ratio must be exactly known. The necessary ignorance is
specified by our inability to discriminate any of the different
sub-aggregates having different limiting frequency ratios, such as
must always exist."
The concept of a population of units is close to saying that those
units are exchangeable. The identification of a sub-aggregate, or
subpopulation, is related to conditional exchangeability (of X, given Y = y),
the discrimination Fisher refers to being effected by Y. Even the apparently
dissimilar notions of probability, of Fisher and de Finetti, are not
unrelated: the limiting frequency (or propensity) being relevant as a
probability statement by You whenever exchangeability is present. Moreover, the
relevant frequency is determined by recognizing the appropriate
subpopulation, or type of conditional exchangeability. On the other hand, there
are two important differences between the concepts. First, exchangeability
of units refers explicitly to a random quantity, whereas a population does
not. Thus the units might be exchangeable in X, but not in Y, or even in
(X, Y). Second, no guidance seems to be given on how to recognize whether
an individual unit belongs to a population, whereas exchangeability, being
a statement about units, does: one unit cannot disturb a limiting frequency,
whereas it can affect exchangeability. We attach considerable importance
to this last point because of our view of inference as a passage from data
to a unit and not, except as an intermediary, to a parameter.

Perhaps the most important difference between the two notions is that
de Finetti gives us a precise definition that we can operate with; whereas
Fisher conveys only a brilliant suggestion that suffers from vagueness in
individual applications. The way we prefer to regard the situation is
that exchangeability makes precise the concepts of populations and
subpopulations; and we will often find it convenient to use Fisherian language.
A major task in inference, as discussed below, is the identification of
the appropriate population to which an individual belongs. Thus in the
medical example we can recognize a subpopulation of treated patients,
namely that defined by sex; whereas, to anticipate the argument below, the
total population is relevant in the agricultural example. To simplify:
practitioners seem to prefer the language of populations; theoreticians,
that of exchangeability.

(Fisher used the concept of subpopulation in inference through the
concept of an ancillary statistic. This may be a misuse of the concept --
it certainly is from the Bayesian view -- but that does not affect the validity
and usefulness of the basic notion. He also used probability to mean "a
fraction of a set" and denied, despite the above quotation, some aspects
of it as a limiting frequency. This point is discussed by Savage (1976, sect.
9.3). Again, this does not invalidate our arguments that follow.)

We now apply these ideas. In Section 4 we deal with two random
variables only, starting with the special case of events and later
generalizing. In Sections 5 and 6 we discuss three random variables,
where new phenomena enter, and Simpson's paradox.

4. Two Random Variables

The discussion in this section owes much to Meehl and Rosen (1955)
and is included as an introduction to the ideas that are then used in
Sections 5 and 6 for the three-variable situation. It is convenient
to work in terms of examples and we begin with one involving two events.
Patients were classified according to whether or not they had a disease,
D, and whether their reaction to a test was positive or negative.
Possible results on n = 100 patients are given in Table 3. To a
statistician, this is a 2 by 2 contingency table and he could employ many of the

Insert  Table  3  about  here 


techniques devised for such tables. In particular, he might regard D
and D̄ as two hypotheses and + and - as data appropriate for
distinguishing between the two. We argue that typically the inference problem is
not confined to the n = 100 units (patients) in the data base but
extends to include other units: for example, those who have responded
positively to the test but are not known to have the disease. The
connection between the new patient and the data base can, we argue,
conveniently be described in terms of exchangeability or populations and we
explore various possibilities.

One possibility is to regard the new patient as exchangeable in
both variables, disease and test, with those in the table.
Alternatively expressed, the 101 patients are a random sample from a
population in both events. If so, we can relate probabilities for the
new patient to propensities in the data base and, for example,
declare p(D|+) = P(D|+) = 0.4 so that a patient responding positively
has a probability of 0.4 of having the disease.
Another possibility is to judge the new patient exchangeable in
test result given the disease classification. This might be
appropriate if, for example, the 100 patients in the data base were from
one city, the new patient from another city, and it was felt that
the disease propensity might differ between cities, but the test
behaved similarly in the two places. In this case all one can infer by
exchangeability is p(+|D) = 0.8 and p(+|D̄) = 0.3, the corresponding
propensities. Alternatively expressed, two subpopulations can be
recognized, corresponding to D and D̄. A new point now arises. In the
instance of a patient with known test result but unknown disease
classification we require p(D|+). This cannot be obtained from the data
alone. With Bayes rule

\[ p(D|+) \propto p(+|D)\,p(D) = 0.8\,p(D), \]

but p(D) is still required. Without the judgment of exchangeability in
D this cannot be obtained from the data. Statisticians have bypassed
this problem by confining their attention to the values obtainable from
the data.
The positive result and the disease are positively associated so
that + favors D; -, D̄. The statistician's error rates are therefore
p(-|D) = 0.2 and p(+|D̄) = 0.3 and the test appears quite useful as a
diagnostic for the disease. Write p(D) = p; then by Bayes rule in odds
form, for a positive test result,

\[ \frac{p(D|+)}{p(\bar{D}|+)} = \frac{8}{3} \cdot \frac{p}{1-p} \]

and, for a negative test result,

\[ \frac{p(D|-)}{p(\bar{D}|-)} = \frac{2}{7} \cdot \frac{p}{1-p}. \]

However, for p < 3/11 ≈ 0.27, both these expressions are less than unity so
that if p satisfies this inequality the test result, on its own, is
useless. Equally, for p > 7/9 ≈ 0.78 both are greater than one and again the
test is of no value. With the judgment of full exchangeability and no
subpopulation identification p = 0.2, the propensity for the disease,
P(D), and the test is useless. The statistician's argument is therefore
incomplete because it ignores the disease probability and uses the
wrong probabilities: for example, p(+|D) instead of p(D|+).
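The dependence of the diagnosis on p can be made concrete. The sketch below uses only the two propensities quoted in the text, p(+|D) = 0.8 and p(+|not D) = 0.3; the function names `posterior_odds` and `is_informative` are hypothetical, and "informative" is taken here to mean that a positive result favors D while a negative result favors not D.

```python
from fractions import Fraction

SENS = Fraction(8, 10)       # p(+|D), from the text
FALSE_POS = Fraction(3, 10)  # p(+|not D), from the text

def posterior_odds(p, positive=True):
    """Odds of D against not D after the test result, by Bayes rule,
    where p is the probability of disease before the test."""
    prior_odds = p / (1 - p)
    if positive:
        likelihood_ratio = SENS / FALSE_POS              # 8/3
    else:
        likelihood_ratio = (1 - SENS) / (1 - FALSE_POS)  # 2/7
    return likelihood_ratio * prior_odds

def is_informative(p):
    """True when the two possible results point to different diagnoses."""
    return posterior_odds(p, True) > 1 > posterior_odds(p, False)

for p in (Fraction(1, 5), Fraction(3, 11), Fraction(1, 2), Fraction(7, 9)):
    print(p, posterior_odds(p, True), posterior_odds(p, False), is_informative(p))
```

At p = 1/5 the odds after a positive result are 2/3, that is p(D|+) = 0.4, in agreement with the propensity quoted under full exchangeability; the boundary values 3/11 and 7/9 are where one of the two odds equals exactly one.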

Another possible exchangeability judgment is that of exchangeability
in disease given test result. This seems an unlikely one but David fid, '7)
has given a careful discussion of how this might happen. When it does,
the required p(D|+) can be equated directly to P(D|+) = 0.4 and Bayes rule
does not have to be invoked.

There are other possibilities. For example, You may judge the new
patient exchangeable in test result given D, but not given D̄. A possible
reason for this is that You may feel that the disease is the same in the
two cities but that patients without the disease have different disease
patterns in the two places; say there is an alternative to D which is
common in one city but rare in the other. In this case only p(+|D) = 0.8
can be found from the propensities in the data base, so that both p(+|D̄)
and p(D) are required from elsewhere before p(D|+) can be assessed by
Bayes rule; alternatively it may be assessed directly.

We learn three things from this study. First, that there are
various forms of connection between the data and a new unit for which an
inference is to be made. Second, that these connections may be made using
exchangeability. Third, that care needs to be exercised in using the
appropriate probability.

In those cases where exchangeability does not provide adequate
connection for the required probability to be equated to a propensity it
will be necessary to assess probabilities using additional information
beyond that in the data base. For example, we saw above that if
exchangeability in test result given disease class is all that is assumed, Bayes
rule required p(D) to be assessed. We may have data on the disease
propensity in the city from which the new patient comes; if so, that may
provide p(D). Alternatively, we may merely feel that the disease is
more common in his city so that p(D) > P(D) = 0.2 and some judgment will
have to be used in default of data.

Few additional points arise when we pass from two events to two
general random variables, X and Y. Again there are various forms of
exchangeability assumption: in both X and Y, in X given Y, or in X
given Y = y for some y. Another terminology is sometimes used in this
context besides that of exchangeability or subpopulation, namely to
describe a random variable as either random or fixed. Thus in regression
of Y on X, Kendall and Stuart (1969) discuss the cases of X random, and
the more common case of X fixed. These correspond to the joint
exchangeability of X and Y, and to that of Y given X, respectively. An
interesting case is that of calibration, where X is a precise measurement --
perhaps the true value -- and Y a simple but less precise one. The usual
judgment is that of exchangeability in Y given X but the required
probability is p(X|Y = y) -- from the imprecise measurement it is required to
evaluate the true value and hence calibrate the measurement. Bayes
rule has then to be invoked and it is necessary to obtain p(X) from
sources other than the data. In Kendall and Stuart's terminology,
X is fixed yet has to be estimated. Similar problems arise in
discrimination and classification problems where X describes the class of,
and Y the measurements on, the units.

Calibration, discrimination, and classification all are fields in
which the wrong probability has often been used, particularly by
statisticians. It is perhaps worth pointing out that the correct approach
has for long been standard practice in some fields. Thus in educational
testing with X the true score and Y the observed score, exchangeability
is invoked for Y given X, the propensity being described by test error.
The distribution of true score, X, in the population is then used to
derive the required distribution of X, given Y. The appropriate
regression formula is due to Kelley (1923). Similar early examples occur
in actuarial science in connection with claim frequencies; see, for example,
Whitney (1918) or Longley-Cook (1962) who provides a survey. Similar
approaches are used in electrical engineering, particularly in signal
discrimination, as numerous papers in the proceedings of IEEE testify.
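The educational-testing case can be sketched in a few lines. The code below assumes, purely for illustration, that true scores are normally distributed in the population and that test error is normal with constant variance; under those assumptions the mean of X given Y = y is the familiar shrinkage form, reliability times the observed score plus (1 - reliability) times the population mean. The function name and the numbers are hypothetical.

```python
def kelley_estimate(observed, mean, var_true, var_error):
    """Mean of the true score X given observed score Y = observed,
    assuming X ~ N(mean, var_true) and Y | X ~ N(X, var_error)."""
    reliability = var_true / (var_true + var_error)
    return reliability * observed + (1 - reliability) * mean

# Hypothetical test: population mean 50, true-score variance 80,
# error variance 20, so the reliability is 0.8.
estimate = kelley_estimate(70, 50, 80, 20)
print(estimate)  # the observed 70 is pulled toward the population mean
```

The population mean and variance of X enter the formula explicitly: they play the role of p(X), which the exchangeability of Y given X cannot supply on its own.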

5. Three Random Events

We first apply the lessons learned in Sections 3 and 4 -- namely the
connection of the data with a new unit, the judgment of exchangeability,
and calculation of the appropriate probability -- to the two examples,
medical and agricultural, of Section 2. Consider in the first case a
new patient, male, about whom a decision has to be made as to whether
to give him the treatment or not. A possible judgment might be of
exchangeability in recovery, given sex and treatment, in which case
p(R|TM) = 0.6 and p(R|T̄M) = 0.7 are available by equating the
probabilities and propensities, and consequently the treatment should be
withheld. Alternatively four subpopulations are identifiable as TM,
T̄M, TM̄, and T̄M̄. The same conclusion would hold for a female. We
mentioned in Section 2 the possible case of someone of unknown sex
(or perhaps unknown genetic makeup). We would then need p(R|T) which,
with exchangeability of the type just assumed, is not available from
observed propensities in the data. However, by extending the
conversation to include sex,


\[ p(R|T) = p(R|TM)\,p(M|T) + p(R|T\bar{M})\,p(\bar{M}|T)
          = 0.6\,p(M|T) + 0.2\,p(\bar{M}|T), \]
and only p(M|T) is required to complete the analysis. Without an
assumption of exchangeability in sex given treatment this cannot be derived
from the propensities of the data. (This was P(M|T) = 0.75.) Instead
You might judge that the decision to use the treatment or the control is
not affected by the unknown sex, so that M and T are independent. In
default of other knowledge You might judge the new patient to be
exchangeable in sex with the rest of the population, where the propensity
to be male is about 1/2. Hence p(M|T) = 0.5 and p(R|T) = 0.4. A
similar calculation for the control gives p(R|T̄) = 0.5 and the control is
preferred for a person of unknown sex. (Once M and T have been judged
independent, the male propensity is irrelevant to the 10% drop in
recovery rate if the treatment is applied.)
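The calculation for a person of unknown sex, and the remark that the 10% drop does not depend on the male propensity, can be verified directly. The sketch below uses the four conditional propensities quoted in the text; the labels "C" for the control and "F" for female, and the function name, are notational assumptions made for the code.

```python
from fractions import Fraction

# p(R | arm, sex): "T" treatment, "C" control, "M" male, "F" female.
p_recover = {
    ("T", "M"): Fraction(6, 10), ("T", "F"): Fraction(2, 10),
    ("C", "M"): Fraction(7, 10), ("C", "F"): Fraction(3, 10),
}

def p_recover_given_arm(arm, p_male):
    """Extend the conversation to include sex: sum over male and female,
    where p_male is the probability of being male given the arm."""
    return (p_recover[(arm, "M")] * p_male
            + p_recover[(arm, "F")] * (1 - p_male))

# Sex judged independent of arm: the same p_male applies to both arms.
for p_male in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)):
    t = p_recover_given_arm("T", p_male)
    c = p_recover_given_arm("C", p_male)
    print(p_male, t, c, c - t)

# Sex confounded with arm as in the data: 3/4 male among the treated,
# 1/4 male among the controls. This reproduces the pooled rates.
print(p_recover_given_arm("T", Fraction(3, 4)),
      p_recover_given_arm("C", Fraction(1, 4)))
```

With a common male propensity the control leads by exactly 1/10 whatever that propensity is; only when different propensities are used for the two arms, as the confounded data would suggest, does the treatment appear to lead.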

The above judgment of exchangeability -- in R, given treatment and
sex -- or the identification of the appropriate four subpopulations, is
an expression of Your belief that treatment and sex cause the recovery
rate to have a certain value. In this view, cause is a judgment by You,
that if this happens then that will randomly follow. In the agricultural
example the causation pattern is likely to be different. (Remember,
treatments are replaced by varieties; sex by height; and recovery by yield.)
There the yield and height are a result of the variety planted, so that
the exchangeability is in yield and height, given variety. Hence, the
propensities of Table 2 now provide p(RM|T), etc., the joint distributions
of yield and height, given variety. In particular, You have the margins
p(R|T) and p(R|T̄) direct from Table 1: their values are respectively 0.5
and 0.4 and the white variety, T, is preferred. Here only two
subpopulations are identified.


In the last paragraph the concept of a "cause" has been introduced.
One possibility would be to use the language of causation, rather than
that of exchangeability or identification of populations. We have not
chosen to do this; nor to discuss causation, because the concept, although
widely used, does not seem to be well-defined. (Here the emphasis is on
definition: there is, of course, an extensive philosophical literature that
does not produce a mathematical definition. The admirable monograph by
Suppes (1970) is the best reference: a more recent discussion is by Toda
(1977).) One definition, that is used in experimental design, is stated
by Rubin (1974, 1978):

"The causal effect of one treatment relative to another for a
particular experimental unit is the difference between the result
if the unit had been exposed to the first treatment and the
result if, instead, the unit had been exposed to the second
treatment."

This is fine as far as it goes but, as Rubin points out, it cannot be
tested directly since a unit typically cannot be exposed to two treatments.
A way to test it is to use "similar" units, some having one treatment,
some another; but then a judgment of similarity is involved. Such a
judgment is conveniently expressed in terms of exchangeability, as Rubin does.
There is a link between our ideas and causation but we have chosen not to
explore it in this paper, partly because it would make the paper overlong,
but more importantly because of formidable difficulties of definition. We
hope that our suggestions involving exchangeability and populations will be
of some help in formalizing and understanding causation.

Another way of looking at Simpson's paradox is through correlation
ideas. Thus, it might be said that the correlation between treatment and
recovery is "spurious" in the medical case; but that between their
agricultural parallels, variety and yield, is "real". This view is usefully
explored by Simon (1954) who distinguishes between the two types of
correlation using linear models relating the three variables; models which would
have different structures in the medical and agricultural cases. These
will be considered below when discussing variables rather than events.
Exchangeability has the advantage over correlation ideas in applying to
non-linear situations.

The contrast between the medical and agricultural examples shows that
there can be no unique method of analyzing the data of Table 2. The
inferences in the two cases are completely different: T̄ is better in the medical,
T in the agricultural, case. Our argument is that the reason for the
difference, and hence the choice of the appropriate analysis, can easily be
appreciated using the notion of exchangeability, or equivalently that of
subpopulations. Another advantage, carefully discussed by Rubin (1978),
is that the Bayesian argument is considerably simplified when the treatment
allocation is performed using a random mechanism.
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It  has  been  pointed  out  in  Section  1  (and  in  the  appendix)  that 
the  paradox  arises  in  the  medic. il  example  because  treatment  and  sex 
have  been  confounded.  However,  this  confounding  does  not  affect  the 
agricultural  example,  where  the  obvious  interpretation  of  Table  1  is, 
as  we  have  seen,  the  correct  one.  These  ideas  are  connected  with  the 
roLe  of  randomization  in  experimental  design.  It  would  be  argued  that 
had  the  treatment  in  the  medical  case  been  assigned  at  random  the  paradox 
could  not  have  arisen.  This  is  in  agreement  with  the  view  adopted  here. 
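The reversal, and its origin in the confounding of treatment with sex, can be
made concrete with a short calculation. The sketch below is illustrative only:
the counts are an assumption, chosen to agree with the figures quoted in the
text (40 units per arm and p(R|MT) = 0.6), and the names `counts`,
`rate_given_sex` and `rate_pooled` are ours.

```python
from fractions import Fraction

# Assumed (recovered, total) counts for each (sex, arm) cell. They are
# illustrative values consistent with the text, not a copy of the tables.
counts = {
    ("M", "T"): (18, 30),
    ("M", "T_bar"): (7, 10),
    ("M_bar", "T"): (2, 10),
    ("M_bar", "T_bar"): (9, 30),
}


def rate_given_sex(sex, arm):
    """Recovery propensity p(R | sex, arm) within one cell."""
    recovered, total = counts[(sex, arm)]
    return Fraction(recovered, total)


def rate_pooled(arm):
    """Recovery propensity p(R | arm) after pooling over sex."""
    recovered = sum(r for (_, a), (r, _) in counts.items() if a == arm)
    total = sum(n for (_, a), (_, n) in counts.items() if a == arm)
    return Fraction(recovered, total)


for arm in ("T", "T_bar"):
    print(arm,
          "pooled:", float(rate_pooled(arm)),
          "M:", float(rate_given_sex("M", arm)),
          "M_bar:", float(rate_given_sex("M_bar", arm)))
```

With these counts the pooled rate favours T while both within-sex rates favour
the control, because three quarters of the treated units are male but only one
quarter of the controls are.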

A mechanism is judged random by You if, among other things, You consider
that the mechanism is unconnected with any other factor. With such a
judgment no other factor such as sex would be expected to disturb the
basic interpretation of Table 1. We therefore see that randomization
can play an important role even in the personalistic, Bayesian view of
inference. This is contrary to the opinion resulting from the basic
theorem in decision theory, that for any randomized decision procedure
there exists a nonrandomized one which is not worse than it, to the
effect that randomization is unnecessary in the Bayesian approach. The
reason for the difference is that the use of a random mechanism is not
necessary, it is merely useful. What is needed is a judgment of
nonexistence of an effect confounded with treatment. It would be quite
sensible in this view to allocate the treatments deliberately and
thoughtfully so that the allocation appeared to possess no confounding
characteristics. One advantage of a random mechanism is that most people,
and not just You, will believe it to be random and hence without connection
to another effect such as sex.

In practice scientists do not allocate completely at random: instead
they obtain a random allocation from the mechanism and then inspect it
for any unusual features before using it. Thus if, in the random
selection of a Latin square, one in which the treatments lay down the
diagonal was obtained, it would be discarded and a new allocation selected.
In other words, the scientist always thinks about the proposed allocation
before using it, which is essentially the argument here: use an
allocation which You think is free from confounding. In any case, it is
better to avoid randomization, as far as possible, by blocking with
respect to any factor thought to influence the results; randomization is
only a last resort. Notice that in small samples, not discussed in this
paper, an allocation found by a random mechanism will always be confounded
with some effect: one can do no better than what the personalistic view
suggests, use an allocation which You think is unlikely to have important
confounding effects. In the agricultural example the confounding with
height is irrelevant since the allocation (of variety) influences the
height and joint exchangeability of yield and height is reasonable. Thus
it is only necessary to consider effects, such as sex, which exist prior
to allocation and not those, such as height, which are influenced by the
assignment. As Lord (1969) points out, the agricultural experiment is
noninformative about the yield of white plants made to grow tall.

Simpson's paradox is related to the sure-thing principle of Savage
(1962), and the relation has been explored by Blyth (1972) and by others
in the discussion to that paper. The principle says that if act f is
preferred to act g when an event A is true, and also when A is false,
then f is preferred to g when You are uncertain about A. The medical
case is apt: $\bar{T}$ is preferred to T, both for M and $\bar{M}$, and
therefore for someone of unknown sex. The agricultural example appears to
violate the principle. The resolution lies in the fact that there the choice
of act, black or white variety, is no longer available to You if
A, a tall plant, is true. Consequently the premises of the principle
are not correct. The principle might apply in Lord's case of conditions
in which plants are made to grow tall. Again the notion of exchangeability
conveniently captures the essence of the distinction.

In Section 4 we discussed the choice of the appropriate probability.
The same point arises with three events. In the two examples of Simpson's
paradox the appropriate one, p(R|T), is available directly. An extension
of the disease/test example of Section 3 will illustrate the point more
forcefully. Suppose, in addition to D, $\bar{D}$, and +, the sex was also
recorded. Then the judgment of exchangeability might be in respect of
test result given sex and disease class. Hence p(+|DM) etc. would be
available from the data propensities, whereas the quantities required
would be p(D|+M) etc. for someone of known sex, or p(D|+) if that is
unknown. By Bayes formula

$$p(D \mid +M) \propto p(+ \mid DM)\, p(D \mid M)$$

and the first factor is available, but the second, p(D|M), would need to
be assessed by other methods. The evaluation of p(D|+) would proceed
as for p(R|T) above by extension of the conversation to include sex.
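The two calculations just described can be sketched as follows. All numbers
are hypothetical: the test characteristics stand for propensities read from
the data, while the prevalences p(D|M), p(D|$\bar{M}$) and the proportion of
males stand for quantities that would have to be assessed from other sources.
The function names are ours.

```python
from fractions import Fraction

# Hypothetical data propensities: (p(+ | D, sex), p(+ | not D, sex)).
test = {
    "M": (Fraction(9, 10), Fraction(1, 10)),
    "M_bar": (Fraction(4, 5), Fraction(1, 5)),
}
# Hypothetical values assessed by other means: p(D | sex) and p(M).
prevalence = {"M": Fraction(1, 10), "M_bar": Fraction(1, 20)}
p_male = Fraction(1, 2)


def posterior_known_sex(sex):
    """p(D | +, sex) by Bayes formula, normalising over D and not-D."""
    p_pos_d, p_pos_not_d = test[sex]
    prior = prevalence[sex]
    numerator = p_pos_d * prior
    return numerator / (numerator + p_pos_not_d * (1 - prior))


def posterior_unknown_sex():
    """p(D | +), extending the conversation to include sex."""
    joint_d_pos = Fraction(0)   # p(D, +)
    marginal_pos = Fraction(0)  # p(+)
    for sex, weight in (("M", p_male), ("M_bar", 1 - p_male)):
        p_pos_d, p_pos_not_d = test[sex]
        prior = prevalence[sex]
        joint_d_pos += weight * p_pos_d * prior
        marginal_pos += weight * (p_pos_d * prior + p_pos_not_d * (1 - prior))
    return joint_d_pos / marginal_pos


print(posterior_known_sex("M"), posterior_known_sex("M_bar"),
      posterior_unknown_sex())
```

Only the entries of `test` play the role of quantities supplied by the data;
changing `prevalence` or `p_male` changes the answers without any change in
the data.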

Two further points are worth making before passing to more general
random variables than events. First, it should be noted that even with
the full data of Table 3 in the medical example the treatment T might
still be preferable to the control $\bar{T}$, even with the exchangeability
assumption already made. For example, there could exist another dichotomy,
say rural and urban, which would reverse the difference again. Thus for
any combination of sex and urbanization, the treatment might give the
preferred recovery rate.

The second point leads on from this. Many sciences are observational
and not experimental; sociology, for example. In these cases factors
cannot always be selected in such a way that You expect no confounding.
Consequently it is sometimes dangerous to make deductions from
observational data and conclude that these will hold for controlled data.

Fisher (1958) made this point in connection with lung cancer, arguing
that the observed association with smoking might not hold if smoking was
controlled because there might exist a factor (he suggested a genetic
one) which played the role of sex in our example and created a spurious
association. Another instance of this might be provided by the same data
set as in Table 2 with varieties replaced by racial classification, yield by
intelligence, and sex by social class. The white people would appear more
intelligent than the black but this might be due to confounding with social
class. Yet this might (or might not) be again reversed by confounding with
some other, unknown factor. Observational materials are themselves inadequate
in situations like this; some judgment of exchangeability is essential in
such cases. The possibility of stronger judgment of exchangeability in
the case of designed experiments as against observational data is one way
of accounting for the superiority of the former type of data collection
over the latter.


6.  Three  Random  Variables 

We now pass from the consideration of three events to look at
situations where one or more of the events are replaced by general random
variables. Consider first the agricultural example of Simpson's paradox with
the high or low yields replaced by Y, a random variable measuring the
yield in, say, tons per acre. Table 4 provides an example. It is
derived from Table 3 by multiplying the propensities there by 40 (to avoid
fractions) and calling them expectations. Thus, for M and T, p(R|MT) = 0.6
giving Y = 24. In each cell n refers to the number of observations. Again
n might be multiplied by some large number and Y identified with
expectations, such as E(Y|MT).


Insert Table 4 about here


The paradox arises since $E(Y \mid MT) < E(Y \mid M\bar{T})$, and
similarly with $\bar{M}$, yet $E(Y \mid T) > E(Y \mid \bar{T})$; and is due
to the confounding between M and T. Merely displaying the result in this
tabular form suggests analysis of variance techniques and in the language of
that area: there are main effects of both factors and a pronounced
interaction. In the agricultural version of Table 4, the judgment is of
exchangeability in Y (yield) and M (height) given T (variety), so that only
the main effect of variety is important in considering a new plot. With the
medical situation the exchangeability is in respect of Y (which might be a
measure of recovery, say increase in blood-cell count) given M (sex) and T
(treatment). Here the interaction is relevant and the important feature that
carries over to a new patient is the conditional distribution of Y given
M and T, and the usual breakdown into main effects and interaction is of
limited use. This emphasizes again the point made earlier that there can
be no unique analysis of data without consideration of the new unit to
which the inference is to be applied. Notice that had the design been
balanced with n = 20 in each cell the main effect of treatment would have
agreed with that for each sex separately.
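The remark about balance can be checked by direct calculation. The sketch
below assumes cell expectations of 24, 28, 8 and 12 (propensities of 0.6, 0.7,
0.2 and 0.3 multiplied by 40; only the first is quoted in the text) and
unbalanced cell sizes of 30, 10, 10 and 30. These values, and the names used,
are illustrative assumptions rather than a transcription of Table 4.

```python
from fractions import Fraction

# Assumed cell expectations E(Y | sex-or-height level, treatment).
cell_mean = {
    ("M", "T"): 24, ("M", "T_bar"): 28,
    ("M_bar", "T"): 8, ("M_bar", "T_bar"): 12,
}
# Assumed unbalanced cell sizes, and the balanced alternative n = 20.
unbalanced_n = {
    ("M", "T"): 30, ("M", "T_bar"): 10,
    ("M_bar", "T"): 10, ("M_bar", "T_bar"): 30,
}
balanced_n = {cell: 20 for cell in cell_mean}


def marginal_mean(arm, n):
    """E(Y | arm): cell expectations weighted by the cell sizes n."""
    cells = [c for c in cell_mean if c[1] == arm]
    total = sum(n[c] * cell_mean[c] for c in cells)
    size = sum(n[c] for c in cells)
    return Fraction(total, size)


for label, n in (("unbalanced", unbalanced_n), ("balanced", balanced_n)):
    print(label, marginal_mean("T", n), marginal_mean("T_bar", n))
```

Under the unbalanced sizes the marginal expectations favour T, against both
rows of the table; with equal cell sizes the marginal comparison agrees with
the comparison within each row.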


The assumption of exchangeability on its own is not enough for valid
inferences. For example, in a randomized block design with treatments
$T_i$ and blocks B giving yields Y, the exchangeability is for Y given
$T_i$ and B. This, by itself, gives no guide to treatments, $p(Y \mid T_i)$.
Usually one assumes that yield differences $\Delta Y$ for two treatments,
$T_i$ and $T_j$, are independent of B so that $p(\Delta Y \mid T_i, T_j, B)$,
available by exchangeability, reduces to the required
$p(\Delta Y \mid T_i, T_j)$. This is the assumption of additivity.
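At the level of expectations, additivity says that the difference between two
treatments is the same in every block. A minimal sketch of that check follows;
the table of expected yields and the labels B1, B2, B3, T1, T2 are
hypothetical.

```python
# Hypothetical expected yields E(Y | block, treatment).
additive_table = {
    "B1": {"T1": 10, "T2": 13},
    "B2": {"T1": 14, "T2": 17},
    "B3": {"T1": 9, "T2": 12},
}
# Same table with one cell altered so that the difference varies by block.
non_additive_table = {
    "B1": {"T1": 10, "T2": 13},
    "B2": {"T1": 14, "T2": 20},
    "B3": {"T1": 9, "T2": 12},
}


def treatment_differences(table, t_i, t_j):
    """Expected yield difference between t_i and t_j within each block."""
    return {block: row[t_i] - row[t_j] for block, row in table.items()}


def is_additive(table, t_i, t_j):
    """True when the treatment difference does not depend on the block."""
    return len(set(treatment_differences(table, t_i, t_j).values())) == 1


print(is_additive(additive_table, "T2", "T1"))
print(is_additive(non_additive_table, "T2", "T1"))
```

When the check succeeds, the common difference can be carried over to a new
unit without knowing its block; when it fails, the block must enter the
inference.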

Suppose next that in addition to Y, the nuisance factor, sex or height,
is also a continuous random variable, X say. The agricultural situation
again provides an example with X as height. The paradox arises whenever

$$E(Y \mid X, T) < E(Y \mid X, \bar{T})$$

for all X, and yet

$$E(Y \mid T) > E(Y \mid \bar{T}).$$

This is clearly possible even within the restricted context of linear
regression with fixed slopes. For suppose

$$E(Y \mid X, T) = \alpha + \beta X \quad \text{and} \quad E(Y \mid X, \bar{T}) = \bar{\alpha} + \beta X$$

and $\alpha < \bar{\alpha}$. We then have

$$E(Y \mid T) = \alpha + \beta\mu \quad \text{and} \quad E(Y \mid \bar{T}) = \bar{\alpha} + \beta\bar{\mu}$$

with $\mu = E(X \mid T)$, $\bar{\mu} = E(X \mid \bar{T})$. The paradox arises
if $(\mu - \bar{\mu}) > (\bar{\alpha} - \alpha)/\beta$, assuming $\beta > 0$.
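The condition can be verified numerically. The sketch below uses hypothetical
intercepts, slope and covariate means; `alpha_bar` and `mu_bar` denote the
values under the alternative to T, and the function names are ours.

```python
def marginal_mean(intercept, slope, covariate_mean):
    """E(Y | treatment) = intercept + slope * E(X | treatment)."""
    return intercept + slope * covariate_mean


def paradox_arises(alpha, alpha_bar, beta, mu, mu_bar):
    """True when T is worse at every X yet better marginally.

    With a common slope beta, T is worse at every X exactly when
    alpha < alpha_bar.
    """
    worse_at_every_x = alpha < alpha_bar
    better_marginally = (marginal_mean(alpha, beta, mu)
                         > marginal_mean(alpha_bar, beta, mu_bar))
    return worse_at_every_x and better_marginally


# Hypothetical values: (alpha_bar - alpha) / beta = 1.
print(paradox_arises(alpha=1, alpha_bar=2, beta=1, mu=5, mu_bar=3))  # mu - mu_bar = 2
print(paradox_arises(alpha=1, alpha_bar=2, beta=1, mu=4, mu_bar=3))  # mu - mu_bar = 1
```

The first call meets the stated condition, since the covariate means differ by
more than 1; the second sits exactly on the boundary and no reversal occurs.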

Just as the previous situation was concerned with the analysis of
variance, this case is handled using covariance ideas. There is a
substantial literature, see for example Lord (1967) and Elashoff (1969), on
when analysis of covariance is appropriate. Again considerations of
exchangeability clarify the picture. If exchangeability in Y given X and
treatment is appropriate as in the medical situation, the propensities
provide for p(Y|X, T) and in particular E(Y|X, T). The required expectation
is

$$E(Y \mid T) = \int E(Y \mid X, T)\, p(X \mid T)\, dX = \alpha + \beta E(X \mid T)$$

on assuming linearity. But E(X|T) is not available from the data and
hence the covariance adjustment is essential. On the other hand, in the
agricultural case Y and X are exchangeable given the variety and
p(Y, X|T) is available from the data. In particular so is the marginal
expectation E(Y|T) and the covariance adjustment is unnecessary. It is
often said that the covariate must not be associated with the treatment.
The examples show that this is false. Notice also that the discussion
does not involve considerations of normality, etc.

We now discuss an example from educational testing where the need
for a covariance adjustment is none too clear but where exchangeability
resolves the issue. Indeed, it was this case that started us on the
whole discussion. An experiment was designed to investigate the
effectiveness of one instructional method T in comparison with the standard
method $\bar{T}$. Two groups were chosen, one taught by T, the other by
$\bar{T}$. The students were then given a test (called the posttest) and
their scores Y were recorded. Since the two groups may have had different
abilities, a pretest was also given resulting in scores X. The problem would
appear to be essentially the same as the medical one in which X, replacing
sex, and T influence Y so that exchangeability is in Y given X and T, and the
covariance adjustment for X is necessary. We suggest this is not reasonable.
For suppose You had pretest value X = x, would You consider Yourself
exchangeable with those who took part in the test and had score x? We
suggest not, because X is well known to depend on the group to which the
student belongs: a value x in a strong group is probably more indicative
of ability than x in a weak group. What You might do is to consider Yourself
exchangeable with those students in the test having the same pretest
true score as Yourself. But true scores have not been measured and so
are not available from the data. The analysis can proceed as follows,
all expectations being for the unit, You, and $\tau$ denoting the true score.

$$
\begin{aligned}
E(Y \mid T, X) &= E\{E(Y \mid T, X, \tau) \mid T, X\} \\
&= E\{E(Y \mid T, \tau) \mid T, X\} \\
&= E\{\alpha + \beta\tau \mid T, X\} \\
&= \alpha + \beta E(\tau \mid T, X)
\end{aligned}
$$

assuming linearity of regression. Similarly

$$E(Y \mid \bar{T}, X) = \bar{\alpha} + \beta E(\tau \mid \bar{T}, X).$$

Now $E(\tau \mid T, X) = \mu + \rho X$ and
$E(\tau \mid \bar{T}, X) = \bar{\mu} + \rho X$ where $\rho$ is the
reliability of the test. Hence

$$E(Y \mid T, X) = \alpha + \beta\mu + \beta\rho X$$

and

$$E(Y \mid \bar{T}, X) = \bar{\alpha} + \beta\bar{\mu} + \beta\rho X.$$

Hence T is preferred if $\alpha + \beta\mu > \bar{\alpha} + \beta\bar{\mu}$;
not necessarily if $\alpha > \bar{\alpha}$. Hence the covariance
interpretation using $\alpha$ and $\bar{\alpha}$ is not the correct one.

The same conclusion persists without a pretest since presumably
$E(X \mid T) = E(X \mid \bar{T})$, the method being applied after the pretest.
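A small numerical illustration may help. All values below are hypothetical: T
has the smaller intercept in the regression of posttest on true score, yet a
group with a larger intercept in the regression of true score on pretest, so
that it is preferred at every pretest value. The names `alpha`, `beta`, `mu`
and `rho` follow the symbols used for the two linear regressions, and
`posttest_regression` is our own helper.

```python
from fractions import Fraction


def posttest_regression(alpha, beta, mu, rho):
    """Intercept and slope of E(Y | group, X) = (alpha + beta*mu) + beta*rho*X.

    Assumes E(Y | group, true score) = alpha + beta * (true score) and
    E(true score | group, X) = mu + rho * X.
    """
    return alpha + beta * mu, beta * rho


beta, rho = 2, Fraction(1, 2)           # hypothetical common slope, reliability
intercept_t, slope_t = posttest_regression(alpha=10, beta=beta, mu=4, rho=rho)
intercept_tb, slope_tb = posttest_regression(alpha=12, beta=beta, mu=1, rho=rho)

# The slopes agree, so the comparison at any pretest score x reduces to a
# comparison of intercepts on the pretest scale.
print(intercept_t, intercept_tb, slope_t == slope_tb)
```

Here the comparison of the two `alpha` values alone would favour the standard
method, whereas the comparison that is relevant for a student with a given
pretest score favours T.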

Nothing essentially new happens when the third factor, previously
T and $\bar{T}$, becomes continuous, Z say. We can have E(Y|X, Z), say,
increasing in Z for all X, so that large values of Z are to be preferred,
and yet E(Y|Z) is decreasing in Z, suggesting small values. Again linear
multiple regression provides an example with

$$E(Y \mid X, Z) = \alpha X + \beta Z$$

and $\beta > 0$, yet

$$E(Y \mid Z) = \alpha E(X \mid Z) + \beta Z = \alpha(\gamma + \delta Z) + \beta Z$$

with $\alpha\delta + \beta < 0$. The consideration of exchangeability and
calculation of the appropriate probability together resolve the problem.
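The linear example can be checked with a few lines of arithmetic. In the
sketch below `gamma` and `delta` are our names for the intercept and slope of
the regression of X on Z, and all numerical values are hypothetical.

```python
def expected_y_given_x_z(x, z, alpha, beta):
    """E(Y | X = x, Z = z) under the linear model alpha*x + beta*z."""
    return alpha * x + beta * z


def expected_y_given_z(z, alpha, beta, gamma, delta):
    """E(Y | Z = z); by linearity, substitute E(X | Z = z) = gamma + delta*z."""
    return expected_y_given_x_z(gamma + delta * z, z, alpha, beta)


# Hypothetical coefficients with beta > 0 and alpha*delta + beta = -1 < 0.
alpha, beta, gamma, delta = 2, 1, 0, -1

# Holding X fixed, a larger Z gives a larger expected Y ...
print(expected_y_given_x_z(3, 1, alpha, beta), expected_y_given_x_z(3, 2, alpha, beta))
# ... but with X left free to vary with Z, a larger Z gives a smaller one.
print(expected_y_given_z(1, alpha, beta, gamma, delta),
      expected_y_given_z(2, alpha, beta, gamma, delta))
```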

With X, Y and Z continuous and linear relations obtaining, the analysis
of Simon (1954) previously referred to may be employed. In the medical
example, the sex, X, affected the treatment, Z, both of which affected the
recovery, Y. In the agricultural situation, the variety, Z, affected both
the height, X, and the yield, Y. The respective linear models are (medical)

$$
\begin{aligned}
a_{11} X &= u_1 \\
a_{21} X + a_{22} Y + a_{23} Z &= u_2 \\
a_{31} X + a_{33} Z &= u_3
\end{aligned}
$$

and (agricultural)

$$
\begin{aligned}
a_{11} X + a_{13} Z &= u_1 \\
a_{22} Y + a_{23} Z &= u_2 \\
a_{33} Z &= u_3
\end{aligned}
$$

Here the u's are error terms and the a's non-zero constants. Direct
calculation shows that the paradox can obtain and the relevant inferences
made. Our approach avoids the restriction to linearity.

7.  Conclusion 

We have argued that the basic process of inference is the passage from a
data set to uncertainty statements about another unit, as exemplified by
"the probability that John will recover if he is given treatment T is 0.6".
The introduction of parameters, the usual subject of inference statements,
may often be a most useful device, but is not, in our view, essential.

Once this view of inference is adopted, one sees that an important aspect
of the inference is the linkage between the data and the new unit. We
have argued that this linkage can be formulated in terms of judgments of
exchangeability between the unit and the data; or, alternatively expressed,
judgments of which subpopulation the unit belongs to. (In this paper we
have confined ourselves to large data sets, and hence to large populations.
Additional complexities arise with smaller data sets and considerations of
finite exchangeability that it would need another paper to explore.
Nevertheless the consideration of what one might do with the population is a
prerequisite to considerations of inference with a small data set. Our
discussion of covariance analysis illustrates this.) Once the linkage is
established, frequencies (or propensities) in the identified subpopulations
may be equated with the corresponding probabilities for the new unit. A
final point is that the required probability may not be obtainable directly
in this way and that other information besides that in the data may be
needed to combine with that originating from the data to make the final
inference.

Once it is recognized that inference involves the passage from a data
set to a new unit, it is clear that there is no unique analysis of a data
set; for it is possible to imagine two units, linked in quite different
manners to the set. Thus the data of Table 3, supposed from a city, might
be applied in one way (joint exchangeability of disease and test result)
to another person from the same city; but otherwise (exchangeability in test
result given D) for someone from a different environment.

In applying the ideas of this paper it is first necessary to consider
the unit about which inferences are to be made. What do You know about the
unit? Its sex, M, for example. What features of the unit can be controlled?
The treatment, T, say. What feature is of interest? Its recovery, R, perhaps.
Then You need to calculate Your probability of what is of interest, given
what You know and can control: here p(R|M, T). The only tool available is
the probability calculus, principally Bayes theorem and extension to include
other variables, relating the required probability to others. Which others
are used depends on the connection with the data.


In our experience, it is generally fairly easy to make the appropriate
judgments of exchangeability, or to recognize the relevant populations.
Sometimes it is necessary to include other variables: for example, true
score in the educational example of Section 6. A useful guide is the
notion of causality, for which a useful guide in turn is the temporal order:
varietal choice later produces height and yield; but sex and treatment
later affect recovery. The important point to recognize is that
exchangeability is a judgment by You, not a property of the external world.
In this view, causation is a reflection of our judgment about the world and
not a truth about it. In the present state of knowledge we may say smoking
causes lung cancer, yet later we may revise this to say that a genetic
factor causes both.

It is important to recognize that the methods described in this paper
do not only apply to situations in which it has been possible to take random
samples from a population. Of course, if this has been done then complete
exchangeability is available and propensities may be identified with
probabilities. But if not, then recognition of subpopulations enables some
partial identifications of probabilities and propensities to be made, and
the remaining probabilities, about which the data are uninformative, have to
be assessed directly. We had an example of this in Section 4 where p(D) had
to be found from sources other than the data. It is a useful contribution
to our understanding of a situation to be able to spell out clearly just
what it is that the data tell us, and what has to be inferred by other means,
in order to make the final inference.

Appendix I. Simpson's Paradox

Consider the paradox in the notation of the paper referring to
events R, T, and M. Without loss of generality suppose T and R are
positively associated; that is, $p(R \mid T) > p(R \mid \bar{T})$, and write
$T \sim R$.

Similarly suppose, again without loss of generality, $R \sim M$. We prove
the following result which does not seem to be available elsewhere:

THEOREM

If Simpson's paradox holds (with $T \sim R$ and $R \sim M$), then $T \sim M$.

(Table 2 provides an illustration of this. In words it says that the
new factor, M, must, with the conventions here adopted, be positively
associated both with R and T.)
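Before turning to the proof, the statement can be checked on a single set of
counts. The sketch below is only an illustration of one instance, not a
substitute for the argument: the counts are an assumption chosen to agree with
the figures quoted in the body of the paper, and the helper names are ours.

```python
from fractions import Fraction

R, T, M = 0, 1, 2  # positions of the three events in each key

# Assumed counts keyed by (recovered, treated, male); illustrative values.
counts = {
    (True, True, True): 18, (False, True, True): 12,
    (True, False, True): 7, (False, False, True): 3,
    (True, True, False): 2, (False, True, False): 8,
    (True, False, False): 9, (False, False, False): 21,
}


def prob(target, given):
    """p(target event | given), where given maps an event index to a value."""
    matching = [(key, n) for key, n in counts.items()
                if all(key[i] == v for i, v in given.items())]
    total = sum(n for _, n in matching)
    hits = sum(n for key, n in matching if key[target])
    return Fraction(hits, total)


def positively_associated(a, b):
    """a ~ b in the sense p(a | b) > p(a | not b)."""
    return prob(a, {b: True}) > prob(a, {b: False})


def simpson_paradox():
    """T ~ R overall, but the ordering reverses within each level of M."""
    reversed_within = all(
        prob(R, {T: True, M: m}) < prob(R, {T: False, M: m})
        for m in (True, False))
    return positively_associated(R, T) and reversed_within


print(simpson_paradox(), positively_associated(R, M), positively_associated(M, T))
```

For these counts the premises hold and so does the conclusion, with p(M|T)
three times as large as the corresponding probability for the controls.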

The proof is most easily appreciated using the Figure: the upper
unit interval gives probabilities conditional on T, the lower on $\bar{T}$.
The arrows connect probabilities having the same conditions except for
$\bar{T}$ replacing T. The essence of the paradox is that those arrows that
involve sex go to the right; those that do not, go to the left. (We have
supposed, again without loss of generality, that
$p(R \mid TM) > p(R \mid T\bar{M})$.) The key point is that $p(R \mid T)$ is
a weighted average of $p(R \mid TM)$ and $p(R \mid T\bar{M})$ with
weights $p(M \mid T)$ and $p(\bar{M} \mid T) = 1 - p(M \mid T)$. For the
reversal of direction of the arrows to take place when sex is excluded the
weights in the upper interval, given T, must differ from those in the lower,
given $\bar{T}$. In the Figure $p(R \mid T)$ is nearer to the upper
right-hand $p(R \mid TM)$ than $p(R \mid \bar{T})$ is to the lower
right-hand $p(R \mid \bar{T}M)$, so that the weight on $p(R \mid TM)$ is
greater than that on $p(R \mid \bar{T}M)$; that is, $p(M \mid T)$ exceeds
$p(M \mid \bar{T})$ as was required.


Insert Figure about here


Appendix II. A Note on Exchangeability in Two Variables

If a set of units is exchangeable in (X,Y) then it is both exchangeable
in Y, and in X, given Y. This is clear from the defining relationship
for exchangeability in (X,Y), namely that $p(X_i = x_i, Y_i = y_i,
\text{all } i)$ be invariant under relabelling of the units, by writing it as
$p(X_i = x_i, \text{all } i \mid Y_i = y_i, \text{all } i)\,
p(Y_i = y_i, \text{all } i)$, and considering the special case $Y_i = y$,
all i. The converse of the statement in the first sentence is however not
true. This is apparent since conditional exchangeability says nothing about
probabilities of X-values given Y-values, except when the latter are all the
same, and this is not enough to construct the defining relationship for
exchangeability in X and Y.
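The defining relationship can be checked mechanically for small finite cases.
The sketch below builds a hypothetical joint distribution for three units as a
two-component mixture of independent, identically distributed units, and
verifies invariance under relabelling both for (X,Y) jointly and for Y alone.
The distribution and all function names are assumptions made for
illustration.

```python
from fractions import Fraction
from itertools import permutations, product


def joint_pmf(units):
    """p(X_i = x_i, Y_i = y_i, all i) for a hypothetical mixture model.

    Given theta the units are independent: Y is Bernoulli(theta), and X
    given Y = y is Bernoulli(3/4) when y == 1 and Bernoulli(1/4) otherwise.
    """
    total = Fraction(0)
    for theta in (Fraction(1, 5), Fraction(4, 5)):
        p = Fraction(1, 2)  # mixture weight of this theta
        for x, y in units:
            p *= theta if y else 1 - theta
            p_x1 = Fraction(3, 4) if y else Fraction(1, 4)
            p *= p_x1 if x else 1 - p_x1
        total += p
    return total


def marginal_y_pmf(ys):
    """p(Y_i = y_i, all i): the joint pmf summed over all X-values."""
    return sum(joint_pmf(tuple(zip(xs, ys)))
               for xs in product((0, 1), repeat=len(ys)))


def is_exchangeable(pmf, values, n):
    """True when pmf is invariant under every relabelling of n units."""
    for units in product(values, repeat=n):
        if any(pmf(perm) != pmf(units) for perm in permutations(units)):
            return False
    return True


pairs = tuple(product((0, 1), repeat=2))   # the four possible (x, y) values
print(is_exchangeable(joint_pmf, pairs, 3))
print(is_exchangeable(marginal_y_pmf, (0, 1), 3))
```

Exact rational arithmetic is used so that the comparison of probabilities
under relabelling is an exact equality rather than a floating-point
approximation.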

These considerations suggest an alternative definition of
exchangeability in X, given Y, to that given in the body of the paper. This
reads: a set of units is exchangeable in X, given Y, if
$p(X_i = x_i, \text{all } i \mid Y_i = y_i, \text{all } i)$
is invariant under relabelling of the units. It is obvious on multiplying
this by $p(Y_i = y_i, \text{all } i)$ that the converse is now true. We have
used the (weaker) definition of conditional exchangeability because that is
all that is needed to equate the propensity with the probability for a new
unit. If one wishes to make inferences about several new units then the
extended definition would be useful. To see this consider two new patients,
H(enry) and J(ohn), who could be given either T or $\bar{T}$. (We know them
to be male and this condition is omitted from the notation.) To make
inferences about their recovery we require probabilities exemplified by
p(HR, JR | HT, JT), where HR means Henry recovers etc. The new definition
would enable this to be equated to a propensity. However we presumably judge
it to be equal to p(HR|HT)p(JR|JT) and then the weaker form suffices. This
condition is related to the assumption of "no interference between units"
referred to by Rubin (1978).